Bayes By Backprop Neural Networks for Dialogue Management
Author
Abstract
In dialogue management for statistical spoken dialogue systems, an agent learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful dialogue policy estimation. Current deep reinforcement learning methods are very promising but rely on ε-greedy exploration, which is not as sample efficient as methods that use uncertainty estimates, such as Gaussian Process SARSA (GPSARSA). This thesis implements Bayes by Backprop, a method to extract uncertainty estimates from deep Q-networks (DQN). These uncertainty estimates are used to guide exploration. We show that Bayes-by-Backprop DQN (BBQN) achieves more efficient exploration and faster convergence to an optimal policy than ε-greedy based methods, and reaches performance comparable to the state of the art in policy optimization, namely GPSARSA, especially when evaluated on more complex domains, and without the high computational complexity of Gaussian Processes. We also implement α-divergences, variational dropout, and minimization of the negative log likelihood as other means to extract uncertainty estimates from DQN, and compare their performance to BBQN and DQN. This work is carried out within the Cambridge University Engineering Department dialogue systems toolkit, CUED-pydial.
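The exploration scheme the abstract describes can be illustrated with a minimal sketch: a diagonal-Gaussian variational posterior over the weights of a (here, single-layer linear) Q-network, from which one Monte Carlo weight sample is drawn per decision and the greedy action under that sample is taken. The feature and action sizes, and the `thompson_action` helper, are hypothetical illustrations, not part of the thesis' actual BBQN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: belief-state features -> Q-values over system actions.
n_features, n_actions = 4, 3

# Diagonal Gaussian variational posterior over one linear Q-layer:
# each weight w ~ N(mu, sigma^2), with sigma = softplus(rho) > 0.
mu = np.zeros((n_features, n_actions))
rho = np.full((n_features, n_actions), -3.0)

def sample_weights():
    """Draw one Monte Carlo sample of the Q-network weights."""
    sigma = np.log1p(np.exp(rho))  # softplus keeps sigma positive
    return mu + sigma * rng.standard_normal(mu.shape)

def thompson_action(belief_state):
    """Act greedily under a single sampled Q-function.

    While the posterior is wide, sampled Q-functions disagree, so
    actions vary (exploration); as training shrinks sigma, the sampled
    policy concentrates on the greedy one (exploitation).
    """
    w = sample_weights()
    q_values = belief_state @ w
    return int(np.argmax(q_values))

action = thompson_action(rng.standard_normal(n_features))
```

Compared with ε-greedy, the randomness here comes from weight uncertainty rather than a fixed exploration rate, which is what makes the exploration uncertainty-directed.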
Similar resources
Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation
In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on ε-greedy exploration, thus subjecting the user to a random choice of action during learning. Alternative approaches such...
Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking
We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann exploration, and bootstrapping-based approaches. Additional...
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additiona...
Weight Uncertainty in Neural Networks
We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields...
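The "compression cost" this abstract refers to is the variational free energy. A standard way to write it (symbols here follow the common presentation of Bayes by Backprop, not a quotation from the paper): with variational posterior q(w | θ) over weights w, prior P(w), and data D, the objective minimized is

```latex
\mathcal{F}(\mathcal{D}, \theta)
  = \mathrm{KL}\!\left[\, q(\mathbf{w} \mid \theta) \,\|\, P(\mathbf{w}) \,\right]
  - \mathbb{E}_{q(\mathbf{w} \mid \theta)}\!\left[ \log P(\mathcal{D} \mid \mathbf{w}) \right]
```

The first term regularizes the posterior toward the prior (the compression cost); the second is the expected negative log likelihood. Minimizing F is equivalent to maximizing the expected lower bound on the marginal likelihood mentioned in the abstract.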
Kickback Cuts Backprop's Red-Tape: Biologically Plausible Credit Assignment in Neural Networks
Error backpropagation is an extremely effective algorithm for assigning credit in artificial neural networks. However, weight updates under Backprop depend on lengthy recursive computations and require separate output and error messages – features not shared by biological neurons, that are perhaps unnecessary. In this paper, we revisit Backprop and the credit assignment problem. We first decomp...